Farsi/Arabic Document Image Retrieval through Sub-Letter Shape Coding for Mixed Farsi/Arabic and English Text

Authors

  • Zahra Bahmani
  • Reza Azmi
Abstract

This paper proposes a retrieval method for Farsi/Arabic document images that requires no explicit recognition. The system can also be used on mixed Farsi/Arabic and English text. The method consists of preprocessing, word and sub-word extraction, detection and cancellation of sub-letter connectors, annotation of sub-letters by shape coding, classification of sub-letters with a decision tree, and sub-letter recognition with an RBF neural network. The proposed system retrieves document images using a new sub-letter shape coding scheme for Farsi/Arabic documents, in which document content is captured through the sub-letter coding of words. The decision-tree-based classifier partitions the sub-letter space into a number of sub-regions by splitting it on one topological shape feature at a time. The topological shape features include height, width, holes, openings, valleys, jags, and sub-letter ascenders/descenders. Experimental results show the advantages of this method for Farsi/Arabic document image retrieval.
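The following is a minimal Python sketch of the decision-tree idea described in the abstract, not the authors' implementation. It shows a tree in which each node tests exactly one topological shape feature, a word encoded as the sequence of its sub-letter codes, and retrieval by matching code strings. The feature thresholds, the tree layout, the code labels (`HD`, `H`, `A`, `V`, `X`) and the helper names are all hypothetical, and the RBF neural network stage is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Union


@dataclass(frozen=True)
class SubLetter:
    """Topological shape features of one sub-letter (values are illustrative)."""
    height: int
    width: int
    holes: int
    openings: int
    valleys: int
    has_ascender: bool
    has_descender: bool


@dataclass
class Node:
    """Internal tree node: tests a single feature, then branches."""
    test: Callable[[SubLetter], bool]
    if_true: Union["Node", str]   # subtree or leaf shape code
    if_false: Union["Node", str]


def classify(node: Union[Node, str], s: SubLetter) -> str:
    """Walk the tree until a leaf (a shape-code string) is reached."""
    while isinstance(node, Node):
        node = node.if_true if node.test(s) else node.if_false
    return node


# Hypothetical tree: every split looks at one feature only.
TREE = Node(
    lambda s: s.holes > 0,
    Node(lambda s: s.has_descender, "HD", "H"),
    Node(
        lambda s: s.has_ascender,
        "A",
        Node(lambda s: s.valleys > 0, "V", "X"),
    ),
)


def encode_word(sub_letters: List[SubLetter]) -> str:
    """Represent a word image by the codes of its sub-letters, in order."""
    return "-".join(classify(TREE, s) for s in sub_letters)


def retrieve(query_code: str, index: Dict[str, Set[str]]) -> List[str]:
    """Return documents whose set of word codes contains the query code."""
    return [doc for doc, codes in index.items() if query_code in codes]
```

In this sketch a query word image would be coded the same way as the indexed words, so retrieval reduces to comparing code strings rather than recognizing characters.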

Similar resources

Comparison of Gabor-Based Features for Writer Identification of Farsi/Arabic Handwriting

Writer identification has recently been studied and has a wide variety of applications. Most studies are based on English documents with the assumption that the written text is fixed (text-dependent methods), and no research has been reported on Farsi or Arabic documents. In this paper, we propose a method for off-line writer identification based on Farsi handwriting, which is text-inde...

Intuitive Coding of the Arabic Lexicon

SYSTRAN started the design and the development of Arabic, Farsi and Urdu to English machine translation systems in July 2002. This paper describes the methodology and implementation adopted for dictionary building and morphological analysis. SYSTRAN’s IntuitiveCoding® technology (ICT) facilitates the creation, update, and maintenance of Arabic, Farsi and Urdu lexical entries, is more modular an...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

Detection and Localization of Farsi/Arabic Text in Video Images

Video text detection plays an important role in applications such as semantic-based video analysis, text information retrieval, archiving and so on. In this paper, we propose a Farsi/Arabic text detection approach. First, edges are extracted with an appropriate edge detector, and then artificial corners are extracted using the edge cross points. Artificial corner histogram analysis is done for ...

Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features were apportioned to each group of images separately. In this regard, each group image surr...

Publication date: 2011